Subtopic Structuring for Full-Length Document Access
Abstract
We argue that the advent of large volumes of full-length text, as opposed to short texts like abstracts and newswire, should be accompanied by corresponding new approaches to information access. Toward this end, we discuss the merits of imposing structure on full-length text documents; that is, a partition of the text into coherent multi-paragraph units that represent the pattern of subtopics that comprise the text. Using this structure, we can make a distinction between the main topics, which occur throughout the length of the text, and the subtopics, which are of only limited extent. We discuss why recognition of subtopic structure is important and how, to some degree of accuracy, it can be found. We describe a new way of specifying queries on full-length documents and then describe an experiment in which making use of the recognition of local structure achieves better results on a typical information retrieval task than does a standard IR measure.
Similar resources
University of Glasgow at the NTCIR-9 Intent task: Experiments with Terrier on Subtopic Mining and Document Ranking
We describe our participation in the subtopic mining and document ranking subtasks of the NTCIR-9 Intent task, for both Chinese and Japanese languages. In the subtopic mining subtask, we experiment with a novel data-driven approach for ranking reformulations of an ambiguous query. In the document ranking subtask, we deploy our state-of-the-art xQuAD framework for search result diversification.
Full discrimination of subtopics in search results with keyphrase-based clustering
We consider the problem of retrieving multiple documents relevant to the single subtopics of a given web query, termed “full-subtopic retrieval”. To solve this problem we present a novel search results clustering algorithm that generates clusters labeled by keyphrases. The keyphrases are extracted from the generalized suffix tree built from the search results and merged through an improved hier...
THUSAM at NTCIR-11 IMine Task
This paper describes our approaches and results in the NTCIR-11 IMine task. In 2014, we participated in the subtasks for Chinese/English Subtopic Mining and Chinese Document Ranking. In the Subtopic Mining subtask, we mine subtopic candidates from various resources and construct the subtopic hierarchy with several different strategies. In the Document Ranking subtask, we rerank the result lists with HITS algorit...
NTU Approaches to Subtopic Mining and Document Ranking at NTCIR-9 Intent Task
Users express their information needs as queries to find relevant documents on the web. However, users’ queries are usually short, so search engines may not have enough information to determine their exact intents. How to diversify web search results to cover users’ possible intents as widely as possible is an important research issue. In this paper, we will propose several subt...
HITSZ-ICRC at NTCIR-12 Temporal Information Access Task
This paper presents the methods the HITSZ-ICRC group used for the Temporalia-2 task at NTCIR-12, including the Temporal Intent Disambiguation (TID) subtask and the Temporal Diversified Retrieval (TDR) subtask. In the TID subtask, we merged the results of a rule-based method and a method based on word temporal intent class vectors to estimate the distribution of temporal intent classes for English and Chinese queries. The...
Publication date: 1993